With the development of natural language processing (NLP) techniques, automatic diagnosis of eye diseases from ophthalmology electronic medical records (OEMR) has become possible. The task is to evaluate the condition of each of a patient's eyes, and we formulate it as a particular multi-label classification problem in this paper. Although there are a few related studies on other diseases, automatic diagnosis of eye diseases exhibits unique characteristics. First, descriptions of the two eyes are intermingled in OEMR documents and mix free text with templated asymptomatic descriptions, resulting in sparse and cluttered information. Second, OEMR documents contain multiple parts of descriptions and are long. Third, it is critical for a disease diagnosis model to be explainable. To overcome these challenges, we present an effective automatic eye disease diagnosis framework, NEEDED. A preprocessing module is integrated to improve the density and quality of information. We then design a hierarchical transformer structure to learn contextualized representations of each sentence in an OEMR document. For diagnosis, we propose an attention-based predictor that enables traceable diagnosis by extracting disease-specific information. Experiments on a real-world dataset and comparisons with several baseline models show the advantages and explainability of our framework.
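The abstract describes the attention-based predictor only at a high level. As a hedged illustration of the general idea of gathering disease-specific information (label-wise attention over sentence representations), here is a minimal PyTorch sketch; the class name, dimensions, and one-query-per-disease design are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

class LabelAttentionPredictor(nn.Module):
    """Minimal sketch: each disease label attends over sentence
    representations to gather disease-specific evidence."""
    def __init__(self, hidden_dim: int, num_labels: int):
        super().__init__()
        # one learnable query vector per disease label (assumption)
        self.label_queries = nn.Parameter(torch.randn(num_labels, hidden_dim))
        self.classifier = nn.Linear(hidden_dim, 1)

    def forward(self, sent_repr: torch.Tensor):
        # sent_repr: (batch, num_sentences, hidden_dim)
        scores = torch.einsum("ld,bsd->bls", self.label_queries, sent_repr)
        attn = scores.softmax(dim=-1)                    # (batch, labels, sents)
        label_ctx = torch.einsum("bls,bsd->bld", attn, sent_repr)
        logits = self.classifier(label_ctx).squeeze(-1)  # (batch, labels)
        # attn can be inspected to trace which sentences drove each diagnosis
        return logits, attn

# usage with random sentence embeddings
model = LabelAttentionPredictor(hidden_dim=768, num_labels=20)
logits, attn = model(torch.randn(2, 30, 768))
```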
Channel and spatial attention mechanisms have been shown to provide an evident performance boost for deep convolutional neural networks (CNNs). Most existing methods focus on only one of them, or run the two in parallel (or in series), neglecting the collaboration between the two kinds of attention. To better establish the feature interaction between the two types of attention, we propose a plug-and-play attention module, which we term "CAT": activating the Collaboration between spatial and channel Attentions based on learned Traits. Specifically, we represent traits as trainable coefficients (i.e., colla-factors) that adaptively combine the contributions of the different attention modules to better fit different image hierarchies and tasks. Moreover, apart from the global average pooling (GAP) and global maximum pooling (GMP) operators, we propose global entropy pooling (GEP), an effective component for suppressing noise signals by measuring the information disorder of feature maps. We introduce this three-way pooling operation into the attention modules and apply an adaptive mechanism to fuse their outcomes. Extensive experiments on MS COCO, Pascal-VOC, CIFAR-100, and ImageNet show that our CAT outperforms existing state-of-the-art attention mechanisms in object detection, instance segmentation, and image classification. The model and code will be released soon.
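The abstract introduces global entropy pooling (GEP) and colla-factors without formulas. The following is a minimal PyTorch sketch of one plausible reading: entropy computed over each channel's spatial map as a third pooling branch, fused with GAP and GMP by trainable colla-factors. All names and the exact fusion form are assumptions, not the released CAT module.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ThreeWayChannelAttention(nn.Module):
    """Sketch: fuse GAP, GMP, and an entropy-based pooling (GEP) with
    learnable colla-factors, then produce channel attention weights."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )
        # colla-factors: trainable weights for the three pooling branches
        self.colla = nn.Parameter(torch.ones(3))

    @staticmethod
    def entropy_pool(x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W); treat each channel's spatial map as a distribution
        b, c, h, w = x.shape
        p = F.softmax(x.view(b, c, h * w), dim=-1)
        return -(p * (p + 1e-8).log()).sum(dim=-1)    # (B, C)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        gap = x.mean(dim=(2, 3))                      # (B, C)
        gmp = x.amax(dim=(2, 3))                      # (B, C)
        gep = self.entropy_pool(x)                    # (B, C)
        w = F.softmax(self.colla, dim=0)
        fused = w[0] * gap + w[1] * gmp + w[2] * gep
        attn = torch.sigmoid(self.mlp(fused)).view(x.size(0), -1, 1, 1)
        return x * attn

# usage on a random feature map
out = ThreeWayChannelAttention(64)(torch.randn(2, 64, 32, 32))
```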
Non-rigid registration, which deforms a source shape in a non-rigid way to align with a target shape, is a classical problem in computer vision. Such problems can be challenging because of imperfect data (noise, outliers, and partial overlap) and the high degrees of freedom. Existing methods typically adopt an $\ell_{p}$-type robust criterion to measure the alignment error and to regularize the smoothness of the deformation, and use a proximal algorithm to solve the resulting non-smooth optimization problem. However, the slow convergence of such algorithms limits their wide application. In this paper, we propose a formulation for robust non-rigid registration based on a globally smooth robust criterion for alignment and regularization, which can effectively handle outliers and partial overlap. The problem is solved with a majorization-minimization algorithm, which reduces each iteration to a convex quadratic problem with a closed-form solution. We further apply Anderson acceleration to speed up the convergence of the solver, enabling it to run efficiently on devices with limited computing power. Extensive experiments demonstrate the effectiveness of our method for non-rigid alignment between two shapes with outliers and partial overlap, and quantitative evaluation shows that it outperforms state-of-the-art methods in terms of registration accuracy and computational speed. The source code is available at https://github.com/yaoyx689/amm_nrr.
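The abstract credits much of the speedup to Anderson acceleration applied to the majorization-minimization solver. Purely as an illustration of the acceleration step (not of the registration energy itself), here is a minimal numpy sketch of Anderson acceleration for a generic fixed-point map g; the window size, ridge term, and toy example are assumptions.

```python
import numpy as np

def anderson_accelerate(g, x0, m=5, iters=100, tol=1e-8):
    """Minimal Anderson acceleration for a fixed-point map x = g(x).
    Keeps the last m iterates and recombines them to speed up convergence."""
    x = x0.copy()
    X_hist, F_hist = [], []              # iterates g(x_k) and residuals f_k
    for _ in range(iters):
        gx = g(x)
        f = gx - x
        if np.linalg.norm(f) < tol:
            return gx
        X_hist.append(gx)
        F_hist.append(f)
        if len(F_hist) > m:
            X_hist.pop(0)
            F_hist.pop(0)
        Fmat = np.stack(F_hist, axis=1)         # (n, k)
        k = Fmat.shape[1]
        ones = np.ones(k)
        # solve min_alpha ||F alpha|| subject to sum(alpha) = 1
        G = Fmat.T @ Fmat + 1e-8 * np.eye(k)    # small ridge for stability
        alpha = np.linalg.solve(G, ones)
        alpha /= alpha.sum()
        x = np.stack(X_hist, axis=1) @ alpha
    return x

# usage on a toy contraction g(x) = 0.5 * (x + c / x) (Babylonian sqrt)
c = 2.0
root = anderson_accelerate(lambda x: 0.5 * (x + c / x), np.array([1.0]))
print(root)  # approx sqrt(2)
```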
In this paper, we study the graphic layout generation problem of producing high-quality visual-textual presentation designs for a given image. We note that image compositions, which contain not only global semantics but also spatial information, largely affect layout results. We therefore propose a deep generative model, called Composition-aware Graphic Layout GAN (CGL-GAN), to synthesize layouts based on the global and spatial visual contents of the input image. To obtain training images from images that already contain manually designed graphic layouts, previous work suggests taking design elements such as texts and embellishments as model inputs, which inevitably leaves hints of the ground truth. We study the misalignment between the training inputs (with hint masks) and the test inputs (without masks), and design a novel domain alignment module (DAM) to narrow this gap. For training, we construct a large-scale layout dataset consisting of 60,548 advertising posters with annotated layout information. To evaluate the generated layouts, we propose three novel metrics based on aesthetic intuitions. Through quantitative and qualitative evaluations, we demonstrate that the proposed model can synthesize high-quality graphic layouts according to image compositions.
DNA double-strand breaks (DSBs) are a form of DNA damage that can lead to abnormal chromosomal rearrangements. Recent technologies based on high-throughput experiments have markedly high costs and technical challenges. We therefore design a graph neural network based method to predict DSBs (GraphDSB), using DNA sequence features and chromosome structure information. To improve the expressive power of the model, we introduce a Jumping Knowledge architecture and several effective structural encoding methods. The contribution of structural information to DSB prediction is validated by experiments on datasets from normal human epidermal keratinocytes (NHEK) and the chronic myeloid leukemia cell line (K562), and ablation studies further demonstrate the effectiveness of the designed components of the proposed GraphDSB framework. Finally, we use GNNExplainer to analyze the contributions of node features and topology to DSB prediction, and demonstrate the high contribution of 5-mer DNA sequence features and two chromatin interaction patterns.
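The abstract attributes part of GraphDSB's expressive power to a Jumping Knowledge architecture. Below is a minimal, hedged PyTorch sketch of the concatenation-style Jumping Knowledge idea (all intermediate layer outputs are "jumped" to the readout); the GCN-style propagation, layer sizes, and dense adjacency input are assumptions and do not reproduce the GraphDSB model.

```python
import torch
import torch.nn as nn

class JumpingKnowledgeGCN(nn.Module):
    """Sketch: stack GNN layers and concatenate every intermediate node
    representation before the readout (concatenation-style JK)."""
    def __init__(self, in_dim, hidden_dim, out_dim, num_layers=3):
        super().__init__()
        dims = [in_dim] + [hidden_dim] * num_layers
        self.layers = nn.ModuleList(
            [nn.Linear(dims[i], dims[i + 1]) for i in range(num_layers)]
        )
        self.readout = nn.Linear(hidden_dim * num_layers, out_dim)

    def forward(self, x, adj):
        # x: (N, in_dim) node features, adj: (N, N) normalized adjacency
        layer_outputs = []
        h = x
        for lin in self.layers:
            h = torch.relu(adj @ lin(h))         # simple GCN-style propagation
            layer_outputs.append(h)
        h_jk = torch.cat(layer_outputs, dim=-1)  # jumping knowledge
        return self.readout(h_jk)

# usage with a toy 5-node graph (identity adjacency for brevity)
model = JumpingKnowledgeGCN(in_dim=8, hidden_dim=16, out_dim=1)
out = model(torch.randn(5, 8), torch.eye(5))
```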
We present LookinGood^π, a novel neural re-rendering approach that aims (1) to improve, in real time, the rendering quality of the low-quality reconstruction results of a human performance capture system, and (2) to improve the generalization ability of the neural re-rendering network to unseen people. Our key idea is to utilize the rendered image of the reconstructed geometry as guidance for predicting person-specific details from a few reference images, thus enhancing the re-rendered result. Accordingly, we design a two-branch network: a coarse branch aims to fix some artifacts (i.e., holes, noise) and obtain a coarse version of the rendered input, while a detail branch aims to predict "correct" details from the warped references. The guidance of the rendered image is realized by effectively blending features from the two branches during training of the detail branch, which improves both the warping accuracy and the fidelity of details. We show that our method outperforms state-of-the-art methods at producing high-fidelity images of unseen people.
While 3D human reconstruction methods using pixel-aligned implicit functions (PIFu) are developing rapidly, we observe that the quality of reconstructed details is still unsatisfactory: flat facial surfaces frequently occur in PIFu-based reconstruction results. To this end, we propose a dual-PIFu representation to improve the quality of reconstructed facial details. Specifically, we use two MLPs to represent the PIFus of the face and the human body separately. An MLP dedicated to 3D face reconstruction increases the network capacity and reduces the difficulty of facial detail reconstruction compared with the previous single-level PIFu representation. To resolve topology errors, we use three RGBD sensors to capture multi-view RGBD data as the network input, a sparse and lightweight capture setup. Since depth noise severely affects reconstruction results, we design a depth refinement module to reduce the noise of the raw depths under the guidance of the input RGB images. We also propose an adaptive fusion scheme to fuse the predicted occupancy field of the body with that of the face, eliminating discontinuity artifacts at their boundary. Experiments demonstrate the effectiveness of our approach in reconstructing vivid facial details and deformed body shapes, and verify its superiority over state-of-the-art methods.
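The abstract mentions an adaptive fusion scheme that blends the body and face occupancy fields without a seam at their boundary. As a loose illustration of one way such a blend can avoid discontinuities, here is a numpy sketch that uses a smoothstep falloff on a normalized distance to the face region; the weighting function and normalization are assumptions, not the paper's scheme.

```python
import numpy as np

def fuse_occupancy(occ_body, occ_face, dist_to_face_center, r_in=0.8, r_out=1.0):
    """Sketch: blend two occupancy fields with a smooth weight that falls
    from 1 (inside the face region) to 0 (outside), avoiding a hard seam.
    dist_to_face_center is normalized so ~1.0 marks the face-region boundary."""
    t = np.clip((dist_to_face_center - r_in) / (r_out - r_in), 0.0, 1.0)
    w_face = 1.0 - (3 * t**2 - 2 * t**3)      # smoothstep falloff
    return w_face * occ_face + (1.0 - w_face) * occ_body

# usage on dummy query points
n = 4
print(fuse_occupancy(np.full(n, 0.4), np.full(n, 0.9),
                     np.linspace(0.0, 1.2, n)))
```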
XORNet-based low-power controllers are a popular technique for reducing circuit transitions in scan-based testing. However, existing solutions construct the XORNet uniformly for scan-chain control, which may lead to sub-optimal solutions without any design guidance. In this paper, we propose a novel testability-aware low-power controller with evolutionary learning. The XORNet generated by the proposed genetic algorithm (GA) enables adaptive control of scan chains according to their usage, which significantly improves the XORNet encoding capacity, thereby reducing the number of ATPG failure cases and the test data volume. Experimental results show that, under the same number of control bits, our GA-guided XORNet design can improve fault coverage by up to 2.11%. The proposed GA-guided XORNet also allows the number of control bits to be reduced, and the total test time is reduced by 20.78% on average and by up to 47.09% compared with existing designs, without sacrificing test coverage.
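The abstract describes the XORNet being generated by a genetic algorithm but gives no details of the encoding or fitness. The following is a generic GA skeleton in Python (bitstring chromosomes, tournament selection, one-point crossover, bit-flip mutation) meant only to illustrate the evolutionary-learning loop; the chromosome encoding of XOR-gate connections and the fitness function are placeholders, not the paper's formulation.

```python
import random

def genetic_search(fitness, chrom_len, pop_size=40, generations=100,
                   crossover_rate=0.9, mutation_rate=0.01, seed=0):
    """Generic GA skeleton: evolve bitstring chromosomes (e.g., an encoding
    of XOR-gate connections) to maximize a user-supplied fitness function."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(chrom_len)] for _ in range(pop_size)]

    def tournament(k=3):
        return max(rng.sample(pop, k), key=fitness)

    for _ in range(generations):
        new_pop = [max(pop, key=fitness)]          # elitism: keep the best
        while len(new_pop) < pop_size:
            p1, p2 = tournament(), tournament()
            if rng.random() < crossover_rate:      # one-point crossover
                cut = rng.randrange(1, chrom_len)
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # bit-flip mutation
            child = [b ^ (rng.random() < mutation_rate) for b in child]
            new_pop.append(child)
        pop = new_pop
    return max(pop, key=fitness)

# usage with a toy fitness (count of ones as a stand-in for fault coverage)
best = genetic_search(fitness=sum, chrom_len=32)
print(sum(best), best)
```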
Pre-trained language models (LMs) store knowledge in their parameters and can generate informative responses when used in conversational systems. However, LMs suffer from the problem of "hallucination": they may generate plausible-looking statements that are irrelevant or factually incorrect. To address this problem, we propose a contrastive learning scheme named MixCL. A novel mixed contrastive objective is proposed to explicitly optimize the implicit knowledge elicitation process of LMs and thus reduce their hallucination in conversations. We also examine negative sampling strategies, including retrieved hard negatives and model-generated negatives. We conduct experiments on Wizard-of-Wikipedia, a public, open-domain knowledge-grounded dialogue benchmark, and assess the effectiveness of MixCL. MixCL effectively reduces the hallucination of LMs in conversations and achieves the highest performance among LM-based dialogue agents in terms of relevancy and factuality. We show that MixCL achieves performance comparable to state-of-the-art KB-based approaches while enjoying notable advantages in efficiency and scalability.
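The exact form of the mixed contrastive objective is not given in the abstract. As a hedged sketch of the generic ingredient it builds on, here is an InfoNCE-style contrastive loss over embeddings of a positive (grounded knowledge) and a set of hard negatives (e.g., retrieved or model-generated); the temperature, normalization, and embedding shapes are assumptions, not MixCL itself.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(anchor, positive, negatives, temperature=0.1):
    """Sketch: InfoNCE-style loss pulling the anchor toward the positive
    (grounded knowledge) and away from negatives (retrieved hard negatives
    and model-generated hallucinations), all given as embeddings."""
    anchor = F.normalize(anchor, dim=-1)          # (B, D)
    positive = F.normalize(positive, dim=-1)      # (B, D)
    negatives = F.normalize(negatives, dim=-1)    # (B, K, D)
    pos_sim = (anchor * positive).sum(-1, keepdim=True)       # (B, 1)
    neg_sim = torch.einsum("bd,bkd->bk", anchor, negatives)   # (B, K)
    logits = torch.cat([pos_sim, neg_sim], dim=1) / temperature
    labels = torch.zeros(anchor.size(0), dtype=torch.long)    # positive at index 0
    return F.cross_entropy(logits, labels)

# usage with random embeddings
loss = contrastive_loss(torch.randn(4, 256), torch.randn(4, 256),
                        torch.randn(4, 8, 256))
print(loss.item())
```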
Effective data imputation demands rich latent "structure" discovery capabilities from "plain" tabular data. Recent graph neural network-based data imputation solutions show strong structure-learning potential by directly translating tabular data into bipartite graphs. However, due to a lack of relations between samples, those solutions treat all samples equally, which goes against an important observation: "similar samples should give more information about missing values." This paper presents a novel Iterative graph Generation and Reconstruction framework for Missing data imputation (IGRM). Instead of treating all samples equally, we introduce the concept of "friend networks" to represent the different relations among samples. To generate an accurate friend network in the presence of missing data, an end-to-end friend network reconstruction solution is designed that allows continuous friend network optimization during imputation learning. The representation of the optimized friend network, in turn, is used to further optimize the data imputation process with differentiated message passing. Experimental results on eight benchmark datasets show that IGRM yields 39.13% lower mean absolute error compared with nine baselines and 9.04% lower than the second-best.
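The abstract notes that GNN-based imputation methods translate tabular data into bipartite graphs. The snippet below sketches that standard construction (sample nodes on one side, feature nodes on the other, one edge per observed cell); it does not cover IGRM's friend-network reconstruction, and the NaN-based missingness encoding is an assumption.

```python
import numpy as np

def tabular_to_bipartite(data: np.ndarray):
    """Sketch: turn a table with missing entries (NaN) into a bipartite
    graph. Sample nodes are 0..n-1, feature nodes are n..n+m-1; an edge
    connects sample i and feature j for every observed cell, with the
    cell value stored as the edge attribute."""
    n, m = data.shape
    rows, cols = np.where(~np.isnan(data))
    edge_index = np.stack([rows, cols + n])       # (2, num_observed_cells)
    edge_attr = data[rows, cols]                  # observed values
    return edge_index, edge_attr

# usage on a tiny table with missing values
table = np.array([[1.0, np.nan, 3.0],
                  [np.nan, 2.0, np.nan]])
ei, ea = tabular_to_bipartite(table)
print(ei)
print(ea)
```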